We present a solvable model that predicts the folding kinetics of two-state proteins from their
native structures. The model is based on conditional chain entropies. It assumes that folding
processes are dominated by small-loop closure events that can be inferred from native structures.
For CI2, the src SH3 domain, TNfn3, and protein L, the model reproduces two-state kinetics, and it
predicts the average Φ-values for secondary structures well. The barrier to folding is the formation
of predominantly local structures such as helices and hairpins, which are needed to bring nonlocal
pairs of amino acids into contact.
I. INTRODUCTION



Protein folding kinetics is usually modeled in one
of three ways. First, there are mass-action models
that capture the amplitudes and decay rates of the
exponentials in the folding or unfolding relaxation
process (Ikai and Tanford 1971; Tsong et al. 1971;
Dill and Chan 1997; Englander 2000). Mass-action
models are useful for cataloging the different types of
kinetic behavior, but give no insight into molecular
structures or mechanisms. Such models do not predict
other experimental properties, such as Φ-values. Second,
there are all-atom or lattice model simulations
that can explore sequence-structure relationships (see,
e.g., Duan and Kollman 1998; Shea and Brooks 2001;
Daggett 2002). They are usually limited by computational
power to short time scales and to studying
restricted conformational ensembles. Third, between
these macroscopic and microscopic extremes, another
type of model has recently emerged. This class of
models uses knowledge of the native structure to infer
the sequences of folding events (Munoz and Eaton 1999;
Alm and Baker 1999; Galzitskaya and Finkelstein 1999;
Debe and Goddard 1999; Clementi et al. 2000;
Li and Shakhnovich 2001; Ivankov and Finkelstein 2001;
Klimov and Thirumalai 2001; Bruscolini and Pelizzola 2002;
Shea et al. 1999; Hoang and Cieplak 2000;
Portman et al. 2001; Alm et al. 2002;
Bruscolini and Cecconi 2002; Flammini et al. 2002;
Micheelsen et al. 2003).
Some of these models define partially folded states with
one or two contiguous sequences of native-like ordered
residues (Munoz and Eaton 1999; Alm and Baker 1999;
Alm et al. 2002; Galzitskaya and Finkelstein 1999).
Others are based on a Go-model energy function
that enforces the global stability of the native
state (Clementi et al. 2000; Hoang and Cieplak 2000;
Li and Shakhnovich 2001).

We describe here a folding model of the third type. 
Our model uses knowledge of the native structure to 
predict the kinetics. However, it differs from previous 
models in several respects. First, our model focuses 
on chain entropies and estimates loop lengths from the 
graph-theoretical concept of effective contact order ECO 
(see below). We follow time sequences of loop-closure 
events because we expect that these events reveal how 
the kinetics is encoded in the native structure. We
assume that folding proceeds mostly through closures
of small loops, and that large-loop closures are much
slower and less important processes. Second, our model
focuses on contacts within the chain, not on whether
residues are native-like or not (Munoz and Eaton 1999;
Alm and Baker 1999; Galzitskaya and Finkelstein 1999),
because we think the formation of contacts is a more
physical description of the folding process. Therefore, 
in our model partially folded states are characterized by 
formed contacts, not by contiguous stretches of native- 
like ordered residues as in other simple models. Third, 
the folding kinetics is described by a master equation 
that can be solved directly for the macrostates considered 
here, without stochastic simulations such as molecular 
dynamics or Monte Carlo. Hence the present treatment 
can handle the full spectrum of temporal events. 

The present work is related to a recent model of protein
zipping (Weikl and Dill 2003a, 2003b). Our
fundamental units of protein structure are contact clusters.
A contact cluster is a collection of contacts that
is localized on a contact map, corresponding roughly
to the main structural elements of the native structure.
Examples of contact clusters are turns, α-helices, β-strand
pairings, and tertiary pairings of helices. A central
quantity in our models is the effective contact order
(ECO) (Fiebig and Dill 1993; Dill et al. 1993). The ECO
is the length of the loop that has to be closed in order to 
form a contact, given a set of previously formed contacts 
or contact clusters. The premise is that the formation of 
the nonlocal contact clusters requires the prior formation 
of other, more local, clusters. 
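As an illustration of how an ECO can be evaluated, the Python sketch below treats residues as graph vertices and finds a breadth-first shortest path between residues i and j. It assumes a convention that is not spelled out here, namely that each backbone bond (k, k+1) and each previously formed contact counts as one step; the helper `effective_contact_order` and the residue indices are hypothetical.

```python
from collections import deque

def effective_contact_order(n_res, i, j, formed_contacts):
    """Shortest-path loop length between residues i and j (hypothetical helper).

    Assumed convention: each chain bond (k, k+1) and each previously formed
    contact (a, b) counts as one step.
    """
    adj = [set() for _ in range(n_res)]
    for k in range(n_res - 1):          # backbone bonds
        adj[k].add(k + 1)
        adj[k + 1].add(k)
    for a, b in formed_contacts:        # shortcuts created by formed contacts
        adj[a].add(b)
        adj[b].add(a)
    dist = {i: 0}
    queue = deque([i])
    while queue:                        # breadth-first search from residue i
        k = queue.popleft()
        if k == j:
            return dist[k]
        for nb in adj[k]:
            if nb not in dist:
                dist[nb] = dist[k] + 1
                queue.append(nb)
    raise ValueError("residues i and j are not connected")

# With no prior contacts, the ECO is just the contact order |i - j|.
print(effective_contact_order(40, 2, 35, []))          # 33
# A formed hairpin contact (10, 28) provides a shortcut: 8 + 1 + 7 steps.
print(effective_contact_order(40, 2, 35, [(10, 28)]))  # 16
```

The example shows the premise of the model in miniature: a previously formed, more local contact halves the loop that must be closed to form the nonlocal contact (2, 35).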

Our model predicts average Φ-values for secondary
structural elements that are in good agreement with
the experimentally observed values for several two-state
proteins. It shows that Φ-value distributions can be
understood from loop-closure events that are defined
by the native topology of a protein. The importance
of topology for routes and Φ-values has also been
noted previously by other groups (Munoz and Eaton 1999;
Alm and Baker 1999; Alm et al. 2002;
Clementi et al. 2000; Vendruscolo et al. 2001).

To compute the dynamics, we use a master equation.
Several previous studies of the folding kinetics
of lattice heteropolymer models have also been based
on master equation methods (Leopold et al. 1992;
Chan and Dill 1993; Cieplak et al. 1998;
Ozkan et al. 2001; Ozkan et al. 2002; Ozkan et al. 2003;
Schonbrun and Dill). These methods have the advantage
that they require no ad hoc assumptions about what 
the transition state is. The transition state emerges 
in a direct physical way from the solution to the 
master equation. However, the lattice models are too 
simplified to treat specific amino acid sequences or 
specific protein structures. Lattice models focus on 
transitions between microstates, the individual chain 
conformations, since these are the fundamental units of 
structure in such models. Our present master equation 
describes transitions between macrostates, defined by 
the contact clusters of a given protein structure. In this 
way, the present model aims to make closer contact with 
experiments. 



II. THE MODEL 

Contact clusters 

To compute the folding kinetics, we start with the
native contact map, the matrix in which element (i, j)
equals 1 if the residues i and j are in contact, and equals 0
otherwise. Two residues are defined as being in contact
if the distance between their Cα or Cβ atoms is less than
6 Å.
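A minimal sketch of the contact-map construction, assuming one representative atom (Cα or Cβ) per residue with coordinates in Å. The `min_separation` cutoff, which excludes near neighbors along the chain, is our assumption and not part of the definition above; the function name and the example coordinates are made up.

```python
import numpy as np

def contact_map(coords, cutoff=6.0, min_separation=3):
    """Binary contact map: 1 if residues i and j are closer than `cutoff` Angstrom.

    coords: (N, 3) array with one representative atom (C-alpha or C-beta) per residue.
    min_separation (assumption): pairs with |i - j| < min_separation are ignored.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]      # all pairwise difference vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])            # sequence separation |i - j|
    return ((dist < cutoff) & (sep >= min_separation)).astype(int)

# Four residues on a square of side 3.8 Angstrom: only residues 0 and 3 form a contact.
square = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]]
C = contact_map(square)
print(C)
```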

Next, we divide the native contact map into contact
clusters. Each contact cluster corresponds to a structural
element of the protein. Two contacts (i, j) and (k, l)
are defined as being in the same cluster if they are close
together on the contact map, according to the distance
criterion |i − k| + |j − l| < 4. We define two types
of clusters: local and nonlocal. Clusters are local if they
contain at least one local contact (i, j) having contact
order CO = |i − j| < 6. Local clusters include helices,
turns, or β-hairpins, for example. A cluster is nonlocal if
it has no local contacts; examples include β-strand pairings
other than hairpins, and the tertiary interactions of
helices. To qualify as nonlocal, a cluster must also have
more than two contacts; isolated nonlocal contacts are
not considered to be clusters. Similarly, we do not count
as part of a cluster any 'peripheral' contacts (i, j) whose
minimum distance |i − k| + |j − l| to the other contacts (k, l)
in the cluster equals 4. In general, typical contact
maps have only a few isolated nonlocal or peripheral contacts.
Fig. 1 shows examples of clusters, specifically for
chymotrypsin inhibitor 2 (CI2) and the src SH3 domain.
By our criteria, CI2 has 5 local clusters and 2 nonlocal
clusters (β2β3 and β1β4), and the src SH3 domain has 6
local clusters and 2 nonlocal clusters (RT-β4 and β1β5).

FIG. 1: Native contact maps and contact clusters for CI2 and
the src SH3 domain.
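The clustering rules can be sketched with a union-find over contacts. Reading "close together" as single linkage (connected components under |i − k| + |j − l| < 4) is our assumption, as are the function name and the toy contacts; a single contact whose nearest neighbor lies at distance exactly 4 is treated as peripheral and dropped.

```python
from itertools import combinations

def contact_clusters(contacts, link_dist=4, local_co=6, min_nonlocal=3):
    """Group contacts (i, j) into clusters labelled 'local' or 'nonlocal'."""
    contacts = sorted(set(contacts))
    parent = list(range(len(contacts)))

    def find(a):                                   # union-find root with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def dist(c, d):                                # distance on the contact map
        return abs(c[0] - d[0]) + abs(c[1] - d[1])

    for a, b in combinations(range(len(contacts)), 2):
        if dist(contacts[a], contacts[b]) < link_dist:
            parent[find(a)] = find(b)              # single linkage (assumed reading)

    groups = {}
    for a, c in enumerate(contacts):
        groups.setdefault(find(a), []).append(c)

    clusters = []
    for members in groups.values():
        if len(members) == 1:                      # drop 'peripheral' single contacts
            others = [dist(members[0], d) for d in contacts if d != members[0]]
            if others and min(others) == link_dist:
                continue
        is_local = any(abs(i - j) < local_co for i, j in members)
        if is_local or len(members) >= min_nonlocal:   # nonlocal needs > 2 contacts
            clusters.append(("local" if is_local else "nonlocal", members))
    return clusters

hairpin = [(1, 10), (2, 9), (3, 8)]        # contains a local contact, CO = 5
pairing = [(2, 30), (3, 31), (4, 32)]      # strand pairing, every CO >= 28
isolated = [(15, 40)]                      # single nonlocal contact: not a cluster
for kind, members in contact_clusters(hairpin + pairing + isolated):
    print(kind, members)
# local [(1, 10), (2, 9), (3, 8)]
# nonlocal [(2, 30), (3, 31), (4, 32)]
```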
States and free energies

We assume that each cluster is either formed or not;
we neglect partial degrees of formation. Thus, for a
protein with M clusters, there are 2^M possible states
that describe the progression to the native state. Each
of these macrostates is characterized by a vector n =
{n_1, n_2, ..., n_M}, where n_i = 1 indicates that cluster i is
formed and n_i = 0 indicates that cluster i is not formed.

In our model, the free energy of the protein as a func- 
tion of the state n of cluster formation is given by: 



$$F_{\mathbf n} = \sum_{i=1}^{M} n_i \left[ c\,\ell_i(\mathbf n) + f_i \right] \qquad (1)$$



Each cluster i that is formed (n_i = 1) contributes two terms to
the free energy F_n of the state n: a state-dependent free
energy of loop closure c·ℓ_i(n) ('initiation' free energy),
and a free energy f_i for forming the cluster contacts
('propagation' free energy). Here, c is a loop-closure
parameter. The quantity ℓ_i = ℓ_i(n) is the initiation
ECO (Weikl et al. 2003a) for cluster i. The initiation
ECO of a cluster is the length of the smallest loop
that must be closed in order to form that cluster from the
other existing clusters. For a local cluster, the initiation
ECO is the smallest CO among its contacts. For a nonlocal
cluster, the initiation ECO depends on the presence
of other clusters in the state n.
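Eq. (1) can be evaluated for all 2^M states of a small toy protein. The sketch below uses two hypothetical clusters: a local hairpin with initiation ECO 4, and a nonlocal cluster whose initiation ECO drops from 30 to 8 once the hairpin is formed. The values c = 1 k_BT, f_l = 0, and f_nl = −20 k_BT, as well as the helper names, are illustrative and not fitted to any protein.

```python
from itertools import product

C_LOOP = 1.0                  # loop-closure parameter c (k_BT per residue), illustrative
F_PROP = [0.0, -20.0]         # f_l for cluster 0 (hairpin), f_nl for cluster 1

def init_eco(state, i):
    """Hypothetical initiation ECOs for a two-cluster toy protein."""
    if i == 0:
        return 4                        # local hairpin: smallest CO of its contacts
    return 8 if state[0] else 30        # nonlocal cluster: shorter loop if hairpin formed

def free_energy(state, c=C_LOOP, f=F_PROP):
    """Eq. (1): F_n = sum_i n_i [c * ell_i(n) + f_i], in units of k_B T."""
    return sum(c * init_eco(state, i) + f[i] for i, n_i in enumerate(state) if n_i)

for state in product((0, 1), repeat=2):
    print(state, free_energy(state))
# (0, 0) 0
# (0, 1) 10.0
# (1, 0) 4.0
# (1, 1) -8.0
```

Even in this toy, forming the unstable local hairpin first (state (1, 0), uphill) is what makes the nonlocal cluster affordable, in line with the role of local structure described in the text.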

In general, the initiation ECO also depends on the se- 
quence through which those clusters are formed. How- 
ever, in order to apply the master equation formalism, 
we need a free energy and thus we require a definition 
of initiation ECO that is only a function of state. For 
this purpose, we use the following scheme. If only one 
nonlocal cluster is formed in a certain state, the initi- 
ation ECO of that cluster is the smallest ECO among 
the cluster contacts, given all the local clusters formed in 
that state. If multiple nonlocal clusters are present in a 
state, we consider all the possible sequences along which 
these clusters can form, and determine the one having the 
smallest sum of ECOs. For instance, for a state with two
nonlocal clusters C_i and C_j, there are two sequences: (1)
C_i → C_j, and (2) C_j → C_i. The minimum ECOs of the
clusters are determined sequentially: ℓ_i^(1) and ℓ_j^(1) along
sequence (1), and ℓ_j^(2) and ℓ_i^(2) along sequence (2). If
ℓ_i^(1) + ℓ_j^(1) is smaller than ℓ_i^(2) + ℓ_j^(2), the initiation
ECOs ℓ_i and ℓ_j of the clusters i and j in the given state are
taken to be ℓ_i^(1) and ℓ_j^(1); otherwise, they are taken to be
ℓ_i^(2) and ℓ_j^(2). The initiation ECOs ℓ_i and ℓ_j
are an estimate for the smallest loop lengths required to
form the two clusters in the state.
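The sequence rule can be written as a search over formation orders of the nonlocal clusters present in a state. In this sketch, `eco(cluster, earlier)` is a hypothetical callback that returns the smallest ECO of a cluster given the local clusters of the state plus the nonlocal clusters in `earlier`; the ECO table is invented for illustration. The search scales factorially, which is harmless for the two nonlocal clusters of CI2 or src SH3.

```python
from itertools import permutations

def initiation_ecos(nonlocal_formed, eco):
    """Initiation ECOs (a state function) for the nonlocal clusters of one state.

    Tries every formation order and keeps the per-cluster ECOs of the order
    with the smallest ECO sum.
    """
    best = None
    for order in permutations(nonlocal_formed):
        ecos, earlier = {}, frozenset()
        for cl in order:
            ecos[cl] = eco(cl, earlier)     # ECO given the clusters formed before it
            earlier = earlier | {cl}
        total = sum(ecos.values())
        if best is None or total < best[0]:
            best = (total, ecos)
    return best[1]

# Hypothetical ECO table for nonlocal clusters "D" and "E".
TOY_ECO = {("D", frozenset()): 12, ("E", frozenset()): 18,
           ("D", frozenset({"E"})): 6, ("E", frozenset({"D"})): 5}
ecos = initiation_ecos(["D", "E"], lambda cl, earlier: TOY_ECO[(cl, earlier)])
print(ecos)  # {'D': 12, 'E': 5}  (order D -> E: 17 beats E -> D: 24)
```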

In eq. (1), the free energy cost of the loops is estimated
by a simple linear approximation in the loop length. This
is not unreasonable since the range of relevant ECOs only
spans roughly one order of magnitude, from about ℓ ≈ 3
to ℓ ≈ 30 or 40. In general, determining the free energy
of a chain molecule with multiple constraints or
contacts is a complicated and unsolved problem. For
the simpler problem of hairpin-like loop closures, several
estimates have been given in the literature (see, e.g.,
Chan and Dill 1990; Galzitskaya and Finkelstein 1999;
Ivankov and Finkelstein 2001).

FIG. 2: Energy landscape for the src SH3 domain as a function
of the 5 major clusters (A) RT, (B) β2β3, (C) β3β4, (D)
RT-β4, and (E) β1β5. Here, BD, for example, means that
only clusters B and D are formed. The free energies given by
eq. (1) are shown in blue (in units of k_BT). Red arrows indicate
uphill steps in the folding direction, green arrows downhill
steps. For clarity, states with free energies larger than 4 k_BT
are omitted.

In principle, this model could treat the detailed energetics
of each folding route, if each of the M clusters
were characterized by its own free energy f_i. But
here we consider a simpler version of the model. We
assume that there are only two parameters for the free
energy of formation: f_i = f_l for propagating any local
cluster, and f_i = f_nl for propagating any nonlocal
cluster. To obtain two-state folding and agreement
with experimental values, we find that f_l must be
nonnegative and f_nl must be negative. This is consistent
with the experimental observation that local structures,
such as helices or β-hairpins, are generally unstable
in isolation. Similar in spirit, the diffusion-collision
model of Karplus and Weaver assumes that
microdomains, e.g. helices, are individually unstable
(Karplus and Weaver 1976, 1994).
Thus, the rate-limiting barrier to folding in our model
turns out to be the formation of mostly local structures
needed to reduce the ECOs of nonlocal clusters. The
driving force for overcoming this barrier is the favorable
free energy f_nl of assembling the nonlocal clusters.

The predicted free energy landscape of the src SH3
domain is shown in Fig. 2 using the parameters f_l = 0
and c = 0.5 k_BT, where k_BT is Boltzmann's constant times
temperature. The value of f_nl is chosen so that the equilibrium
probability that the two nonlocal clusters RT-β4
and β1β5 are both folded ('native state') is 0.9, which
gives f_nl = −6.6 k_BT for src SH3. With these parameter
settings, we obtain good agreement with average
experimental Φ-values for the src SH3 domain and other
two-state folders (see below). For clarity, we show in the
figure only a reduced set of states based on the 5 major
clusters RT, β2β3, β3β4, RT-β4, and β1β5. The three
small clusters T, DT, and H have negligible effects on
the folding kinetics and on the Φ-values. Only states differing
by the formation of a single cluster are kinetically
connected. The uphill steps in this model are either steps
in which a local cluster is formed, or steps involving high
ECOs. The downhill steps are steps in which a nonlocal
cluster is formed with a low ECO, or steps in which a
local cluster significantly reduces the ECOs of previously
formed nonlocal clusters. The model predicts two main
folding routes. Along the upper route, (E) β1β5 folds after
(D) RT-β4; along the lower route, they form in the
opposite order. Along these routes, the barriers (the highest-free-energy
states) are the states in which two clusters
are formed: BD and BC for the upper route, and AC for
the lower route.
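The choice of f_nl amounts to solving a one-dimensional equation, since the equilibrium probability of the native state can only decrease as f_nl increases. A bisection sketch for a hypothetical two-cluster toy (c = 1 k_BT, f_l = 0): each state is stored as its f_nl-independent free energy plus the number of nonlocal clusters it contains. The state list and function names are illustrative assumptions, not the src SH3 model.

```python
import math

# (free energy without the f_nl terms, number of nonlocal clusters formed), in k_BT,
# for the toy states (0,0), (1,0), (0,1), (1,1).
STATES = [(0.0, 0), (4.0, 0), (30.0, 1), (12.0, 1)]
N_NONLOCAL = 1

def native_probability(f_nl, states=STATES, n_nonlocal=N_NONLOCAL):
    """Equilibrium probability that all nonlocal clusters are formed."""
    weights = [(math.exp(-(base + k * f_nl)), k) for base, k in states]
    z = sum(w for w, _ in weights)
    return sum(w for w, k in weights if k == n_nonlocal) / z

def solve_f_nl(target=0.9, lo=-50.0, hi=0.0, iters=200):
    """Bisection on f_nl so that native_probability(f_nl) equals target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if native_probability(mid) > target:
            lo = mid        # native too stable: f_nl can be less negative
        else:
            hi = mid
    return 0.5 * (lo + hi)

f_nl = solve_f_nl()
print(f"{f_nl:.3f}")  # -14.215
```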



Master equation 

In this section, we describe the folding dynamics. We 
use the master equation, 



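$$\frac{dP_{\mathbf n}(t)}{dt} = \sum_{\mathbf m \neq \mathbf n} \left[ w_{\mathbf{nm}} P_{\mathbf m}(t) - w_{\mathbf{mn}} P_{\mathbf n}(t) \right] \qquad (2)$$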



which gives the time evolution of the probability P_n(t)
that the protein is in state n at time t. Here, w_nm is the
transition rate from state m to n. The master equation
can be written in matrix form



$$\frac{d\mathbf P(t)}{dt} = -\mathbf W\,\mathbf P(t) \qquad (3)$$



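where P(t) is the vector with elements P_n(t), and the
matrix elements of W are given by

$$W_{\mathbf{nm}} = -w_{\mathbf{nm}} \quad \text{for } \mathbf n \neq \mathbf m, \qquad W_{\mathbf{nn}} = \sum_{\mathbf m \neq \mathbf n} w_{\mathbf{mn}} \qquad (4)$$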



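The transition rates are given in terms of the free energies
by

$$w_{\mathbf{nm}} = \frac{\delta_{|\mathbf n - \mathbf m|,1}}{t_0 \left\{ 1 + \exp\left[ (F_{\mathbf n} - F_{\mathbf m})/(k_BT) \right] \right\}} \qquad (5)$$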



where t_0 is a reference time scale. The only transitions
that are assigned nonzero rates w_nm are
those incremental steps that change the state n by a single
cluster unit. This is enforced by the term δ_{|n−m|,1}
in eq. (5), where the Kronecker δ_ij is one for i = j
and zero otherwise. The condition |n − m| = 1 is
only satisfied by pairs of states n = {n_1, ..., n_M} and
m = {m_1, ..., m_M} with n_k ≠ m_k for a single cluster
k, and with n_i = m_i for all other clusters i. The transition
rates (5) satisfy detailed balance, w_nm P_m^eq = w_mn P_n^eq,
where P_n^eq ∝ exp[−F_n/(k_BT)] is the equilibrium weight
of the state n. We have chosen here the 'Glauber dynamics'
with w_nm = t_0^{-1} (1 + exp[(F_n − F_m)/(k_BT)])^{-1}. Another
standard choice satisfying detailed balance is the
Metropolis dynamics, which should lead to equivalent results.
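A sketch of how W in eq. (4) can be assembled from the state free energies with the Glauber rates (5), followed by numerical checks of probability conservation, detailed balance, and stationarity of P^eq. States are tuples of 0/1 flags; the four free energies (in k_BT) belong to a hypothetical two-cluster toy, t_0 = 1, and `rate_matrix` is an illustrative name.

```python
import itertools
import numpy as np

def rate_matrix(F, M, t0=1.0):
    """W of eq. (4) with Glauber rates (5); F maps state tuples to F/(k_B T)."""
    states = list(itertools.product((0, 1), repeat=M))
    idx = {s: k for k, s in enumerate(states)}
    W = np.zeros((len(states), len(states)))
    for m in states:
        for k in range(M):                              # flip exactly one cluster
            n = m[:k] + (1 - m[k],) + m[k + 1:]
            w_nm = 1.0 / (t0 * (1.0 + np.exp(F[n] - F[m])))   # rate m -> n
            W[idx[n], idx[m]] -= w_nm                   # off-diagonal: -w_nm
            W[idx[m], idx[m]] += w_nm                   # diagonal: escape rate from m
    return states, W

F = {(0, 0): 0.0, (0, 1): 10.0, (1, 0): 4.0, (1, 1): -8.0}   # toy free energies
states, W = rate_matrix(F, M=2)
p_eq = np.exp(-np.array([F[s] for s in states]))
p_eq /= p_eq.sum()
print(np.allclose(W.sum(axis=0), 0.0))         # True: probability is conserved
print(np.allclose(W * p_eq, (W * p_eq).T))     # True: w_nm P_m = w_mn P_n
print(np.allclose(W @ p_eq, 0.0))              # True: P_eq is stationary
```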

The detailed balance property of the transition rates
implies that the eigenvalues of the matrix W are real.
One of the eigenvalues is zero, corresponding to the equilibrium
distribution, while all other eigenvalues are positive
(van Kampen 1992). The solution to the master
equation is given by

$$\mathbf P(t) = \sum_{\lambda} c_\lambda \mathbf Y_\lambda \exp[-\lambda t] \qquad (6)$$

where Y_λ is the eigenvector corresponding to the eigenvalue
λ, and the coefficients c_λ are determined by the
initial condition P(t = 0). For t → ∞, the probability
distribution P(t) tends towards the equilibrium distribution
P^eq ∝ Y_0, where Y_0 is the eigenvector with eigenvalue
λ = 0.

Solving the master equation gives a set of 2^M eigenvalues,
each with its associated eigenvector. Each eigenvalue
represents a relaxation rate. As initial conditions
at t = 0, we start from the state in which no clusters are 
formed. This corresponds to folding from high tempera- 
tures or high denaturant concentrations. 
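Because the rates obey detailed balance, W is similar to the symmetric matrix D^{-1} W D with D = diag(√P^eq), which gives real eigenvalues and a numerically stable route to eq. (6). The sketch below uses that similarity transform with `numpy.linalg.eigh`, starting from the fully unfolded state; the toy free energies (in k_BT), t_0 = 1, and the function names are illustrative assumptions.

```python
import itertools
import numpy as np

def rate_matrix(F, M, t0=1.0):
    """W of eq. (4) with Glauber rates (5); F maps 0/1 state tuples to F/(k_B T)."""
    states = list(itertools.product((0, 1), repeat=M))
    idx = {s: k for k, s in enumerate(states)}
    W = np.zeros((2 ** M, 2 ** M))
    for m in states:
        for k in range(M):
            n = m[:k] + (1 - m[k],) + m[k + 1:]          # flip cluster k
            w = 1.0 / (t0 * (1.0 + np.exp(F[n] - F[m])))
            W[idx[n], idx[m]] -= w
            W[idx[m], idx[m]] += w
    return states, W

def solve_master_equation(F, M, P0, times, t0=1.0):
    """P(t) = sum_lambda c_lambda Y_lambda exp(-lambda t), eq. (6)."""
    states, W = rate_matrix(F, M, t0)
    p_eq = np.exp(-np.array([F[s] for s in states]))
    p_eq /= p_eq.sum()
    d = np.sqrt(p_eq)
    S = W * d[None, :] / d[:, None]       # D^-1 W D, symmetric by detailed balance
    lam, U = np.linalg.eigh(S)            # real eigenvalues in ascending order
    Y = U * d[:, None]                    # columns: eigenvectors Y_lambda of W
    coeff = U.T @ (P0 / d)                # c_lambda fixed by P(t = 0)
    P_t = np.array([Y @ (coeff * np.exp(-lam * t)) for t in times])
    return states, lam, P_t, p_eq

F = {(0, 0): 0.0, (0, 1): 10.0, (1, 0): 4.0, (1, 1): -8.0}   # toy landscape
P0 = np.array([1.0, 0.0, 0.0, 0.0])      # all probability in the unfolded state (0, 0)
states, lam, P_t, p_eq = solve_master_equation(F, 2, P0, times=[0.0, 5000.0])
print(lam)   # lam[0] is ~0; lam[1] is the slow folding rate
```

Working with the symmetrized matrix avoids the ill-conditioning that a general nonsymmetric eigensolver can suffer when state free energies differ by many k_BT.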



III. RESULTS 
The cooperativity in two-state kinetics 

The signature of two-state kinetics is the existence of
one slow relaxation process (described by a single exponential),
separated in time from 2^M − 2 fast relaxations
(a 'burst' phase). Fig. 3 shows the eigenvalue spectra
for CI2 and the src SH3 domain, based on using the parameters
c = 0.5 k_BT, local cluster free energy f_l = 0,
and a nonlocal cluster free energy chosen so that the
equilibrium 'native' population with all nonlocal clusters
formed has probability 0.9. The latter condition leads to
f_nl = −7.9 k_BT for CI2, and f_nl = −6.6 k_BT for src
SH3. Fig. 4 shows the predicted folding dynamics for the
src SH3 domain.

The spectra in Fig. 3 show that for these proteins,
the eigenvalues do indeed separate into a slow single-exponential
step and a burst phase, consistent with the
experimental observation of two-state behavior. The
slowest relaxation rate λ_1 is about one order of magnitude
smaller than the other nonzero eigenvalues (see Fig. 3).
At times t > 1/λ_1, the probability distribution (6) is well
approximated by P(t) ≈ c_0 Y_0 + c_1 Y_1 exp[−λ_1 t], where
Y_0 is the eigenvector with eigenvalue 0, which characterizes
the equilibrium state, and Y_1 is the eigenvector
with eigenvalue λ_1.

The typical time evolution of the folding process predicted
by the model is as follows. We have two time
scales, t_0 and t_f ≈ 1/λ_1. Time t_0 is characteristic
of the burst phase in the model, and t_f is the single-exponential
folding time. At the earliest times, t < t_0,
single local clusters start to form: examples are the clusters
A, B, and C of the src SH3 domain (see Fig. 2).
As shown in Fig. 4, on this time scale each cluster is
only weakly populated, with a probability less than 10%.
Any structures having larger-scale organization (cluster
pairs, triplets, etc.) have negligible populations. At intermediate
times, t_f > t > t_0, there is a crossover from
the burst phase to the single-exponential folding process.
During these intermediate times, cluster pairs (AC, BC,
BD) begin to form. Fig. 2 shows that these cluster pairs
are the barrier events, i.e., they represent the conformational
states of maximum free energy encountered during
folding. Finally, on the longest time scale, t ≈ t_f, the
pairwise and triplet clusters reach sufficiently high populations
to assemble into multi-cluster complexes, proceeding
downhill in free energy to the native structure.

FIG. 3: Eigenvalue spectra for CI2 and the src SH3 domain
in units of 1/t_0, where t_0 is the reference time scale for the
transition rates.

What is the basis for the cooperativity of folding in
our model, i.e., for the separation of time scales? First,
the formation of local structures in our model reduces
the loop-closure entropies for the formation of the nonlocal
structures. Second, only the nonlocal structures have
favorable propagation free energies f_i = f_nl < 0. Hence,
the formation of the nonlocal structures stabilizes the
overall fold, and thus also the local structures. The barrier
arises from the positive free energies in eq. (1) due to
the formation of local structures and loops (see Fig. 2).
Interestingly, if we set the free energies for local structure
formation to be negative by several k_BT, we obtain
fast multi-exponential downhill folding, without a barrier.
Based on experiments and theory, such downhill
folding has been recently postulated for the protein BBL
(Garcia-Mira et al. 2002).

To understand the cooperative folding in the model, it
is instructive to turn off the loop-closure term in eq. (1)
by setting c = 0. Then all M clusters are independent
of each other, and there is no cooperativity. It
can be shown that the matrix W then has the eigenvalues
λ = j/t_0, where j is an integer between 0 and
M, the number of clusters. Each of these eigenvalues
has a degeneracy given by the binomial coefficient
M!/[j!(M − j)!]. This follows because, for independent
clusters, the Glauber rates (5) for forming and breaking a
cluster sum to 1/t_0 regardless of f_i, so each cluster relaxes
at rate 1/t_0. This gives a broad non-two-state
spectrum. Hence, the separation of time scales (and
the two-state cooperativity) arises in this model from
the coupling of the clusters via the loop-closure term in
eq. (1).

FIG. 4: (Top) Time evolution of the formation probability P
of the major clusters of the src SH3 domain during folding
(see Fig. 2). (Bottom) Time evolution of state probabilities
along the exemplary path from the denatured state → B → BC →
BCD → BCDE → ABCDE of the src SH3 domain (see also Fig. 2).
The initial state at time t = 0 is the denatured state in which
none of the clusters is formed.
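The c = 0 result can be checked numerically with a sketch like the following, which builds W for M = 4 independent clusters with arbitrary, hypothetical values of f_i (in k_BT, t_0 = 1) and counts how often each eigenvalue occurs. Symmetrizing with √P^eq lets `numpy.linalg.eigvalsh` return accurate real eigenvalues.

```python
import itertools
from collections import Counter
from math import comb
import numpy as np

M = 4
f = [0.0, 1.5, -2.0, 3.0]                    # arbitrary cluster free energies (k_B T)
states = list(itertools.product((0, 1), repeat=M))
F = {s: sum(fi for fi, ni in zip(f, s) if ni) for s in states}   # eq. (1) with c = 0
idx = {s: k for k, s in enumerate(states)}

W = np.zeros((2 ** M, 2 ** M))
for m in states:
    for k in range(M):
        n = m[:k] + (1 - m[k],) + m[k + 1:]
        w = 1.0 / (1.0 + np.exp(F[n] - F[m]))   # Glauber rate, t0 = 1
        W[idx[n], idx[m]] -= w
        W[idx[m], idx[m]] += w

# Symmetrize with sqrt(P_eq) (detailed balance) so that eigvalsh applies.
d = np.sqrt(np.exp(-np.array([F[s] for s in states])))
lam = np.linalg.eigvalsh(W * d[None, :] / d[:, None])
j = np.rint(lam).astype(int)
print(np.allclose(lam, j))                   # True: every eigenvalue is an integer j / t0
print(sorted(Counter(j.tolist()).items()))   # [(0, 1), (1, 4), (2, 6), (3, 4), (4, 1)]
```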

To see the magnitude of the barrier, note that the folding
rate λ_1 is related to the height of the energy barrier
on the folding landscape. For comparison, consider a
mass-action model with three states D ↔ T ↔ N (denatured
state, 'transition' state, native state) and transition
rates as in eq. (5). The folding rate is given, to a
very good approximation, by (1/2) t_0^{-1} exp[−F^‡/(k_BT)]
for barrier energies F^‡ = F_T − F_D ≫ k_BT. The factor
of 1/2 comes from the fact that a molecule in state T
can jump to either D or N with almost equal probability,
since both sides of high-barrier transition states are
steep downhills. Now, for the energy landscape of the
src SH3 domain shown in Fig. 2, the minimum barrier
has free energy 2.4 k_BT for state BD. The corresponding
barrier crossing rate of (1/2) t_0^{-1} exp(−2.4) ≈ 0.045/t_0 is in good
agreement with the folding rate λ_1 ≈ 0.05/t_0 (see Fig. 3).

Table 1: Maximum probability P_max and Y_1 elements for
transient states of the src SH3 domain.

state   P_max   Y_1 element
C       0.13    0.21
B       0.062   0.10
A       0.047   0.08
BD      0.016   0.019
BC      0.010   0.017
AC      0.007   0.011
BCD     0.015   0.010
ABD     0.004   0.003
ACE     0.004   0.001
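The three-state estimate can be compared with the exact slowest relaxation rate of a D ↔ T ↔ N chain that uses the rates (5). In this sketch the barrier of 5 k_BT and the native free energy of −5 k_BT are arbitrary test values, t_0 = 1, and `three_state_rate` is a hypothetical helper; the last line only evaluates the estimate for the 2.4 k_BT barrier quoted above.

```python
import numpy as np

def three_state_rate(F_D, F_T, F_N, t0=1.0):
    """Slowest nonzero relaxation rate of the chain D <-> T <-> N with rates (5)."""
    F = np.array([F_D, F_T, F_N])
    W = np.zeros((3, 3))
    for a, b in [(1, 0), (0, 1), (2, 1), (1, 2)]:        # allowed jumps b -> a
        w = 1.0 / (t0 * (1.0 + np.exp(F[a] - F[b])))
        W[a, b] -= w
        W[b, b] += w
    lam = np.sort(np.linalg.eigvals(W).real)
    return lam[1]

F_barrier = 5.0
exact = three_state_rate(0.0, F_barrier, -5.0)
approx = 0.5 * np.exp(-F_barrier)                        # (1/2) t0^-1 exp[-F/(k_B T)]
print(exact / approx)                                    # close to 1 for a high barrier

# Estimate for the 2.4 k_BT barrier of state BD in src SH3, in units of 1/t0:
print(round(0.5 * np.exp(-2.4), 3))                      # 0.045
```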

Experiments have been interpreted either as indicating
that burst phases involve structure formation or that
burst phases are processes of non-structured polymer
collapse, depending on the protein and the experimental
method (Englander 2000; Callender et al. 1998;
Gruebele et al. 1998; Eaton et al. 1997;
Parker and Marqusee 2000; Ferguson and Fersht 2003).
In our model, the burst phase is a process of structure 
formation. Non-structured collapse is beyond the scope, 
or resolution, of our model, because the model has only 
a single fully unstructured state - the state in which 
none of the clusters is formed. The burst phase in our 
model captures fast preequilibration events within the 
denatured state in response to initiating the folding 
conditions at t = 0. In the model, this denatured state 
is an ensemble of macrostates on one side of the barrier 
in the energy landscape (see Fig. 2). It is reasonable
to assume that such preequilibration events within the 
denatured state exist also for real proteins. However, 
whether these events can be detected as burst phases 
in experiments should depend on the initial conditions, 
experimental probes, etc. 

During folding or unfolding, certain conformations will
be populated transiently. If the populations of those conformations
are always small, we call them 'hidden intermediates'
(Ozkan et al. 2002). The population of a
hidden intermediate conformation rises to a maximum,
P_max, then falls as the protein ultimately becomes fully
folded. The term 'hidden' means that P_max is always
small enough that it does not contribute an additional
kinetic phase; i.e., the folding kinetics is two-state. Here,
we consider two quantities. (1) We compute P_max for the
transient states. For simplicity, we consider only the 5
major clusters RT, β2β3, β3β4, RT-β4, and β1β5. (2) We
look at the elements of the eigenvector Y_1, the eigenvector
corresponding to the smallest nonzero eigenvalue λ_1. These
elements show how the various conformations grow and
decay with rate λ_1 as folding proceeds. Table 1 shows
that the maximum population P_max correlates well with
the elements of Y_1. For a typical route of src SH3, Fig. 4
(bottom) illustrates the decay of the denatured state and
hidden intermediates and the growth of the native state,
all with rate λ_1.
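P_max and the Y_1 elements can both be read off the spectral solution of the master equation. The sketch below does this for a hypothetical two-cluster toy landscape (free energies 0, 10, 4, and −8 k_BT for states (0,0), (0,1), (1,0), (1,1); t_0 = 1), starting from the unfolded state and sampling P(t) on a logarithmic time grid. The landscape and the grid are our choices, not the src SH3 model.

```python
import itertools
import numpy as np

F = {(0, 0): 0.0, (0, 1): 10.0, (1, 0): 4.0, (1, 1): -8.0}   # toy landscape (k_BT)
M = 2
states = list(itertools.product((0, 1), repeat=M))
idx = {s: k for k, s in enumerate(states)}
W = np.zeros((2 ** M, 2 ** M))
for m in states:
    for k in range(M):
        n = m[:k] + (1 - m[k],) + m[k + 1:]
        w = 1.0 / (1.0 + np.exp(F[n] - F[m]))           # Glauber rate, t0 = 1
        W[idx[n], idx[m]] -= w
        W[idx[m], idx[m]] += w

p_eq = np.exp(-np.array([F[s] for s in states]))
p_eq /= p_eq.sum()
d = np.sqrt(p_eq)
lam, U = np.linalg.eigh(W * d[None, :] / d[:, None])     # symmetrized W
Y = U * d[:, None]                                        # eigenvectors Y_lambda of W
P0 = np.zeros(len(states))
P0[idx[(0, 0)]] = 1.0                                     # start unfolded
coeff = U.T @ (P0 / d)

times = np.logspace(-2, 4, 600)
P_t = np.array([Y @ (coeff * np.exp(-lam * t)) for t in times])
P_max = P_t.max(axis=0)                                   # peak population of each state
Y1 = Y[:, 1]                                              # slowest relaxation mode
for s in states:
    print(s, f"P_max={P_max[idx[s]]:.3g}", f"Y1={Y1[idx[s]]:+.3g}")
```

In this toy, the Y_1 elements of the unfolded and native states have opposite signs, reflecting the decay of one and the growth of the other at the single rate λ_1.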



Average Φ-values for secondary structural elements

The effects of a mutation on the folding kinetics are
often explored through experimental measurements of a
Φ-value, which is defined as

$$\Phi = \frac{k_BT \ln(k'_f/k_f)}{\Delta G' - \Delta G} \qquad (7)$$

where k_f is the folding rate of the wild-type protein and ΔG
is its stability, and k'_f and ΔG' are the corresponding
quantities for the mutant protein.

Since the minimal structural units in our model are
clusters of contacts, we do not calculate Φ-values for
single-residue mutations. Rather, we consider whole helices
and strands as units. To compare with experiments,
we average the experimental Φ-values over all the
residues composing a given secondary structural element.

To calculate average Φ-values for secondary structures,
we consider 'mutations' that change the free energy f_i of
a contact cluster according to

$$\Delta f_i(j) = x_{ji}\,\epsilon \qquad (8)$$

where x_ji is the fraction of residues of the secondary
structural element j that are involved in contacts of the
cluster i, and ε is a small energy. For example, if the secondary
structural element j contains m_1 residues, and
m_2 ≤ m_1 of these residues appear in contacts of the
cluster i, we have x_ji = m_2/m_1. Note that 0 ≤ x_ji ≤ 1,
where the value x_ji = 1 is obtained if the whole secondary
structural element j has contacts in cluster i. Thus the
Φ-value for the secondary structural element j is given
by eq. (7) with

$$\ln(k'_f/k_f) = \ln(\lambda'_1/\lambda_1) \qquad (9)$$

where λ'_1 is the smallest nonzero eigenvalue of the mutant
with cluster free energies f_i + Δf_i(j), and

$$\Delta G' - \Delta G = \sum_i \Delta f_i(j) \qquad (10)$$

For ε ≪ k_BT, we find that the calculated values are
nearly independent of ε. We choose here ε = 0.01 k_BT.
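A finite-difference sketch of eqs. (8) to (10) for a hypothetical two-cluster toy: λ_1 is recomputed after shifting the cluster free energies by x_ji ε. Here Φ is written as the shift in activation free energy, −k_BT ln(λ'_1/λ_1), divided by ΣΔf_i(j); this sign convention, in which Φ is near 1 for elements formed at the barrier and near 0 otherwise when ε > 0, is our reading. The toy ECOs, c = 1 k_BT, f_nl = −20 k_BT, and all function names are illustrative.

```python
import itertools
import numpy as np

C_LOOP = 1.0
F_PROP = np.array([0.0, -20.0])          # f_l (hairpin, cluster 0), f_nl (cluster 1)

def init_eco(state, i):                   # hypothetical toy initiation ECOs
    if i == 0:
        return 4
    return 8 if state[0] else 30

def slowest_rate(f, c=C_LOOP, t0=1.0):
    """lambda_1: smallest nonzero eigenvalue of W for cluster free energies f."""
    M = len(f)
    states = list(itertools.product((0, 1), repeat=M))
    F = {s: sum(c * init_eco(s, i) + f[i] for i in range(M) if s[i]) for s in states}
    idx = {s: k for k, s in enumerate(states)}
    W = np.zeros((2 ** M, 2 ** M))
    for m in states:
        for k in range(M):
            n = m[:k] + (1 - m[k],) + m[k + 1:]
            w = 1.0 / (t0 * (1.0 + np.exp(F[n] - F[m])))
            W[idx[n], idx[m]] -= w
            W[idx[m], idx[m]] += w
    d = np.sqrt(np.exp(-np.array([F[s] for s in states])))
    return np.linalg.eigvalsh(W * d[None, :] / d[:, None])[1]

def phi_value(x_j, eps=0.01):
    """Phi for element j with Delta f_i(j) = x_ji * eps, eq. (8)."""
    df = eps * np.asarray(x_j, dtype=float)
    barrier_shift = -np.log(slowest_rate(F_PROP + df) / slowest_rate(F_PROP))
    return barrier_shift / df.sum()

print(phi_value([1.0, 0.0]))   # element whose residues all lie in the hairpin: close to 1
print(phi_value([0.0, 1.0]))   # element entirely in the nonlocal cluster: close to 0
```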

Predicted Φ-values are compared with experiments in
Fig. 5. The theoretical Φ-values were calculated with the
same parameters for all four proteins (see figure caption).
The predicted values agree well with the experimental
values. This comparison indicates that the folding kinetics
of these proteins is dominated by generic features
of the fold topology, rather than by the specific energetic
details, i.e., which residues form contacts, what hydrogen
bonds or hydrophobic interactions are worth, the
details of sidechain packing, etc. In the case of protein
G (see Fig. 6), the experimental Φ-value distribution is
largely reproduced by making the additional assumption
that the α-helix cluster has a free energy f = −2.0 k_BT,
rather than the value of f_l that we have otherwise
used for local clusters (see Fig. 5). However, even
without changing this parameter, the theoretical Φ-value
distribution reflects the feature of the experimental distribution
that the Φ-values for the strands β3 and β4 are larger
than those for β1 and β2.

FIG. 5: Theoretical and average experimental Φ-value distributions for the secondary structural elements of CI2, the src SH3
domain, TNfn3, and protein L. The parameter of the loop-closure term is c = 0.5 k_BT, and the free energy of the local clusters is
f_l = 0. The free energy f_nl of the nonlocal clusters is chosen so that the probability that all nonlocal clusters are formed is 0.9
in equilibrium.





FIG. 6: Comparison of theoretical and experimental Φ-value
distributions for protein G. (Light grey): theoretical Φ-values for the same
parameters as in Fig. 5. (Black): average experimental Φ-values.
(Dark grey): theoretical Φ-values when assuming that
the free energy of the α-helix cluster is f_l = −0.5 k_BT, deviating
from the standard value f_l = 1.5 k_BT for the local clusters.



IV. CONCLUSIONS 

We have developed a simple model of the folding kinetics
of two-state proteins. The model aims to predict the
folding rates of the fast and slow processes, the folding
routes, and Φ-values for a protein, if the native structure
is given. The dominant folding routes are found to
be those having small ECOs, i.e., steps that involve only
small 'loop closures'. The model parameters include: c,
an intrinsic free energy for loop closure; f_l, the free energy
for propagating contacts in local structures; and f_nl,
the free energy for propagating nonlocal contacts. The
model predicts that the barrier to two-state folding is
the formation of local structural elements like helices and
hairpins, and that the steps involving their assembly into
larger and more native-like structure are downhill in free
energy.